ABSTRACT
Superword-level parallelism (SLP) vectorization is a proven technique for vectorizing straight-line code. It works by replacing independent, isomorphic instructions with equivalent vector instructions. Larsen and Amarasinghe originally proposed using SLP vectorization (together with loop unrolling) as a simpler, more flexible alternative to traditional loop vectorization. However, this vision of replacing traditional loop vectorization has not been realized, because SLP vectorization cannot directly reason about control flow.
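As a rough illustration of the SLP idea described above (not code from the paper), the C sketch below unrolls a simple element-wise loop by four and then packs the four isomorphic adds, together with their adjacent loads and stores, into single vector operations. It assumes the GCC/Clang `vector_size` extension as a stand-in for target SIMD instructions; the type `v4i` and the functions `add_scalar`, `add_unrolled`, and `add_slp` are hypothetical names chosen for illustration.

```c
#include <string.h>

/* Assumption: GCC/Clang vector extension, 4 x 32-bit int in one
   16-byte vector, standing in for a target SIMD register. */
typedef int v4i __attribute__((vector_size(16)));

/* Original loop: one scalar add per iteration. */
void add_scalar(const int *a, const int *b, int *c, int n) {
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Step 1, unroll by 4: the body now holds four independent,
   isomorphic statements on adjacent elements. */
void add_unrolled(const int *a, const int *b, int *c, int n) {
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        c[i + 0] = a[i + 0] + b[i + 0];
        c[i + 1] = a[i + 1] + b[i + 1];
        c[i + 2] = a[i + 2] + b[i + 2];
        c[i + 3] = a[i + 3] + b[i + 3];
    }
    for (; i < n; i++) /* scalar remainder */
        c[i] = a[i] + b[i];
}

/* Step 2, SLP packing: the four adjacent loads, adds, and stores
   are each replaced by a single vector operation. */
void add_slp(const int *a, const int *b, int *c, int n) {
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        v4i va, vb, vc;
        memcpy(&va, a + i, sizeof va);  /* packed load a[i..i+3] */
        memcpy(&vb, b + i, sizeof vb);  /* packed load b[i..i+3] */
        vc = va + vb;                   /* one vector add for four lanes */
        memcpy(c + i, &vc, sizeof vc);  /* packed store c[i..i+3] */
    }
    for (; i < n; i++) /* scalar remainder */
        c[i] = a[i] + b[i];
}
```

All three functions compute the same result. In a compiler, the vectorizer rather than the programmer performs the packing step, after checking that the grouped instructions are independent and that their memory accesses are adjacent.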
In this work, we introduce SuperVectorization, a new vectorization framework that generalizes SLP vectorization to uncover parallelism that spans different basic blocks and loop nests. Because it can systematically vectorize instructions across control-flow regions such as basic blocks and loops, our framework simultaneously subsumes the roles of the inner-loop, outer-loop, and straight-line vectorizers while retaining the flexibility of SLP vectorization (e.g., partial vectorization).
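Plain SLP cannot pack instructions that live in different basic blocks, such as the two arms of an `if`. A standard way to expose such instructions to a vectorizer is if-conversion (predication), which turns control dependence into data dependence (Allen et al. 1983; Park and Schlansker 1991). The C sketch below shows that classic technique only; it is not the SuperVectorization algorithm. It assumes the GCC/Clang `vector_size` extension (including its lane-wise comparisons, which yield -1 for true and 0 for false), and the names `v4i`, `branch_scalar`, and `branch_vec` are hypothetical.

```c
#include <string.h>

/* Assumption: GCC/Clang vector extension, 4 x 32-bit int lanes. */
typedef int v4i __attribute__((vector_size(16)));

/* Scalar loop whose body contains a branch (two basic blocks). */
void branch_scalar(const int *a, int *c, int n) {
    for (int i = 0; i < n; i++) {
        if (a[i] > 0)
            c[i] = a[i] + a[i];   /* "then" block */
        else
            c[i] = -a[i];         /* "else" block */
    }
}

/* If-converted and packed: both sides are computed for all four
   lanes, and the branch condition becomes a per-lane mask. */
void branch_vec(const int *a, int *c, int n) {
    const v4i zero = {0, 0, 0, 0};
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        v4i va;
        memcpy(&va, a + i, sizeof va);
        v4i mask = va > zero;       /* -1 (all ones) where true, 0 otherwise */
        v4i then_v = va + va;       /* "then" side, every lane */
        v4i else_v = -va;           /* "else" side, every lane */
        v4i vc = (mask & then_v) | (~mask & else_v); /* per-lane blend */
        memcpy(c + i, &vc, sizeof vc);
    }
    for (; i < n; i++) /* scalar remainder */
        c[i] = a[i] > 0 ? a[i] + a[i] : -a[i];
}
```

Because both sides are evaluated in every lane, if-conversion trades extra work for the absence of branches; it is typically profitable only when both sides are cheap and free of side effects such as stores or potentially faulting loads.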
Our evaluation shows that a single instance of our vectorizer is competitive with and, in many cases, significantly better than LLVM’s vectorization pipeline, which includes both loop and SLP vectorizers. For example, on an unoptimized, sequential volume renderer from Pharr and Mark, our vectorizer achieves a 3.28× speedup, whereas none of the production compilers that we tested vectorizes the renderer, because of its complex control-flow constructs.
REFERENCES
- 2022. Auto-Vectorization in GCC. https://gcc.gnu.org/projects/tree-ssa/vectorization.html
- 2022. Auto-Vectorization in LLVM. https://llvm.org/docs/Vectorizers.html
- 2022. llvm::TargetTransformInfo Class Reference. https://llvm.org/doxygen/classllvm_1_1TargetTransformInfo.html
- Randy Allen and Ken Kennedy. 1987. Automatic Translation of FORTRAN Programs to Vector Form. ACM Transactions on Programming Languages and Systems.
- Randy Allen, Ken Kennedy, Carrie Porterfield, and Joe Warren. 1983. Conversion of Control Dependence to Data Dependence. In Symposium on Principles of Programming Languages.
- Sara S. Baghsorkhi, Nalini Vasudevan, and Youfeng Wu. 2016. FlexVec: Auto-vectorization for Irregular Loops. In Programming Language Design and Implementation.
- Bob Blainey, Christopher Barton, and José Nelson Amaral. 2002. Removing impediments to loop fusion through code transformations. In International Workshop on Languages and Compilers for Parallel Computing.
- David Callahan, Jack J Dongarra, and David Levine. 1988. Vectorizing Compilers: A Test Suite and Results. In ACM/IEEE Conference on Supercomputing.
- Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck. 1991. Efficiently Computing Static Single Assignment Form and the Control Dependence Graph. ACM Transactions on Programming Languages and Systems.
- Tobias Grosser, Armin Größlinger, and Christian Lengauer. 2012. Polly – Performing polyhedral optimizations on a low-level intermediate representation. Parallel Processing Letters.
- Khronos Group. 2009. OpenCL 1.0 Specification. http://khronos.org/registry/cl/specs/opencl-1.0.pdf
- Ralf Karrenberg and Sebastian Hack. 2011. Whole Function Vectorization. In International Symposium on Code Generation and Optimization.
- Ken Kennedy and Kathryn S McKinley. 1993. Maximizing loop parallelism and improving data locality via loop fusion and distribution. In International Workshop on Languages and Compilers for Parallel Computing. 301–320.
- Samuel Larsen and Saman Amarasinghe. 2000. Exploiting Superword Level Parallelism with Multimedia Instruction Sets. In Programming Language Design and Implementation.
- Chris Lattner and Vikram Adve. 2004. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In International Symposium on Code Generation and Optimization: Feedback-directed and Runtime Optimization.
- Jun Liu, Yuanrui Zhang, Ohyoung Jang, Wei Ding, and Mahmut Kandemir. 2012. A Compiler Framework for Extracting Superword Level Parallelism. In Programming Language Design and Implementation.
- Charith Mendis and Saman Amarasinghe. 2018. goSLP: Globally Optimized Superword Level Parallelism Framework. Proceedings of the ACM on Programming Languages.
- Simon Moll and Sebastian Hack. 2018. Partial Control-Flow Linearization. In Programming Language Design and Implementation.
- Dorit Nuzman, Ira Rosen, and Ayal Zaks. 2006. Auto-vectorization of Interleaved Data for SIMD. In Programming Language Design and Implementation.
- Dorit Nuzman and Ayal Zaks. 2008. Outer-loop Vectorization: Revisited for Short SIMD Architectures. In International Conference on Parallel Architectures and Compilation Techniques.
- Karl J. Ottenstein, Robert A. Ballance, and Arthur B. MacCabe. 1990. The Program Dependence Web: A Representation Supporting Control-, Data-, and Demand-Driven Interpretation of Imperative Languages. In Programming Language Design and Implementation.
- Joseph CH Park and Mike Schlansker. 1991. On predicated execution.
- Matt Pharr and William R. Mark. 2012. ispc: A SPMD Compiler for High-Performance CPU Programming. In Innovative Parallel Computing.
- Vasileios Porpodas and Timothy M. Jones. 2015. Throttling Automatic Vectorization: When Less is More. In Conference on Parallel Architecture and Compilation.
- Vasileios Porpodas, Alberto Magni, and Timothy M. Jones. 2015. PSLP: Padded SLP Automatic Vectorization. In International Symposium on Code Generation and Optimization.
- Vasileios Porpodas, Rodrigo CO Rocha, and Luís FW Góes. 2018. VW-SLP: auto-vectorization with adaptive vector width. In International Conference on Parallel Architectures and Compilation Techniques.
- Vasileios Porpodas, Rodrigo C. O. Rocha, Evgueni Brevnov, Luís F. W. Góes, and Timothy Mattson. 2019. Super-Node SLP: Optimized Vectorization for Code Sequences Containing Operators and Their Inverse Elements. In International Symposium on Code Generation and Optimization.
- Louis-Noël Pouchet. 2021. PolyBench/C: the polyhedral benchmark suite. https://web.cse.ohio-state.edu/~pouchet.2/software/polybench/
- Rodrigo C. O. Rocha, Vasileios Porpodas, Pavlos Petoumenos, Luís F. W. Góes, Zheng Wang, Murray Cole, and Hugh Leather. 2020. Vectorization-Aware Loop Unrolling with Seed Forwarding. In International Conference on Compiler Construction.
- Ira Rosen, Dorit Nuzman, and Ayal Zaks. 2007. Loop-aware SLP in GCC. In GCC Developers Summit.
- Jaewook Shin, Mary Hall, and Jacqueline Chame. 2005. Superword-Level Parallelism in the Presence of Control Flow. In International Symposium on Code Generation and Optimization.
- Jean-Baptiste Tristan, Paul Govereau, and Greg Morrisett. 2011. Evaluating Value-Graph Translation Validation for LLVM. In Programming Language Design and Implementation.
- Peng Tu and David Padua. 1995. Efficient Building and Placing of Gating Functions. In Programming Language Design and Implementation.
Index Terms
- All you need is superword-level parallelism: systematic control-flow vectorization with SLP